Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading empty datasets and attributes #67

Merged
merged 1 commit into from
Mar 14, 2024

Conversation

axelboc
Copy link
Contributor

@axelboc axelboc commented Feb 5, 2024

Currently, when encountering an empty dataset (e.g. with h5py: grp.create_dataset("empty", dtype=np.float32)) or an empty attribute (e.g. with h5py: grp.attrs["empty"] = h5py.Empty(np.float32)), h5wasm parses the shape as [] and returns the value undefined.

Two problems with this:

  • The empty array shape is also used for scalars.
  • The undefined value can be limiting - e.g. if you put the value in a state once you've read it, then it's difficult to know when the value has actually been read.

With this PR, h5wasm now returns null shapes and null values for empty datasets and attributes.

Note that I'm only adding support for reading empty datasets and attributes, not for writing them.

to_array(): string | number | boolean | JSONCompatibleOutputData[];
get value(): OutputData | null;
get json_value(): JSONCompatibleOutputData | null;
to_array(): JSONCompatibleOutputData | null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding | null to OutputData and JSONCompatibleOutputData would have been quite challenging because the types of recursive. I opted for this simpler typing solution, which also avoids dealing with null in deeper internal functions (e.g. process_data).

Note also that I fixed a few type inference issues (string | number | boolean | JSONCompatibleOutputData[] is the same as JSONCompatibleOutputData).

src/hdf5_hl.ts Outdated
let known_type = true;
// let length: number;
if (type === Module.H5T_class_t.H5T_STRING.value) {
if (metadata.vlen) {
let output = [];
let output: string[] = [];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more recent version of TypeScript in VS Code infers this as never[] and shows an error on the line with output.push() below. There were a few cases like this.

@@ -153,9 +158,9 @@ function process_data(data: Uint8Array, metadata: Metadata, json_compatible: boo
else if (type === Module.H5T_class_t.H5T_COMPOUND.value) {
const { size, compound_type } = <{size: Metadata["size"], compound_type: CompoundTypeMetadata}>metadata;
let n = Math.floor(data.byteLength / size);
let output = [];
let output: (OutputData | JSONCompatibleOutputData)[] = [];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite accurate (should be OutputData[] | JSONCompatibleOutputData[]) but sufficient to get rid of the error.

@@ -168,7 +173,7 @@ function process_data(data: Uint8Array, metadata: Metadata, json_compatible: boo

else if (type === Module.H5T_class_t.H5T_ARRAY.value) {
const { array_type } = <{array_type: Metadata}>metadata;
shape = shape.concat(array_type.shape);
shape = (<number[]>shape).concat(<number[]>array_type.shape);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_data should never be called with a null shape, and the array_type metadata should never be parsed with a null. Could check this with proper runtime assertions, but didn't seem worth it.

const { json_value, metadata } = this;
const { shape } = metadata;
if (!isIterable(json_value) || typeof json_value === "string") {
return json_value;
}
return create_nested_array(json_value, shape);
return create_nested_array(json_value, <number[]>shape);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the attribute is empty, then json_value is null (i.e. not iterable).

@@ -529,27 +534,27 @@ export class Attribute {
this.shape = metadata.shape;
}

get value() {
get value(): OutputData | null {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just being a bit more explicit (here and on a few other methods). It helps double check that the implementation of the method is typed correctly (notably get_attr in this case).

@@ -19,14 +19,14 @@ export interface H5T_class_t {

export interface Metadata {
array_type?: Metadata,
chunks: Array<number> | null,
chunks: number[] | null,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor stylistic change.

@bmaranville
Copy link
Member

This is great! Thanks for the PR. I don't have time to merge it right this second but probably later in the week.

@axelboc
Copy link
Contributor Author

axelboc commented Feb 5, 2024

Thanks, no rush at all! 😉

Copy link
Member

@bmaranville bmaranville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests seem to pass, and the code changes look good, so I'll merge it now. Thanks for the PR!

@bmaranville bmaranville merged commit 5f92764 into usnistgov:main Mar 14, 2024
1 check passed
@axelboc axelboc deleted the empty branch March 14, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants